Re-Evaluating the Netflix Prize - Human Uncertainty and its Impact on Reliability

Authors

  • Kevin Jasberg
  • Sergej Sizov
Abstract

In this paper, we examine the statistical soundness of comparative assessments within the field of recommender systems in terms of reliability and human uncertainty. From a controlled experiment, we get the insight that users provide different ratings on same items when repeatedly asked. This volatility of user ratings justifies the assumption of using probability densities instead of single rating scores. As a consequence, the well-known accuracy metrics (e.g. MAE, MSE, RMSE) yield a density themselves that emerges from convolution of all rating densities. When two different systems produce different RMSE distributions with significant intersection, then there exists a probability of error for each possible ranking. As an application, we examine possible ranking errors of the Netflix Prize. We are able to show that all top rankings are more or less subject to high probabilities of error and that some rankings may be deemed to be caused by mere chance rather than system quality.
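The idea that uncertain ratings turn RMSE into a distribution, and that overlapping distributions imply a ranking error probability, can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure: the normal rating densities, the spread `rating_sd`, the synthetic predictions, and the function names `rmse_per_draw` and `ranking_error_probability` are all assumptions made for illustration, and the abstract's convolution is approximated here by sampling.

```python
import numpy as np


def rmse_per_draw(predictions, rating_draws):
    """Return one RMSE value per row of resampled ratings.

    predictions:  shape (n_items,)
    rating_draws: shape (n_draws, n_items), each row one plausible set of ratings
    """
    return np.sqrt(np.mean((rating_draws - predictions) ** 2, axis=1))


def ranking_error_probability(rmse_a, rmse_b):
    """Share of draws that contradict the ranking implied by the mean RMSEs."""
    a_ranked_first = rmse_a.mean() < rmse_b.mean()
    contradicts = rmse_a > rmse_b if a_ranked_first else rmse_b > rmse_a
    return float(np.mean(contradicts))


# Synthetic data only; none of these numbers come from the Netflix Prize.
rng = np.random.default_rng(0)
n_items, n_draws = 500, 2000
rating_means = rng.uniform(1.0, 5.0, n_items)  # hypothetical mean rating per item
rating_sd = 0.6                                # hypothetical human uncertainty

# Two hypothetical recommenders with slightly different prediction noise.
pred_a = rating_means + rng.normal(0.0, 0.90, n_items)
pred_b = rating_means + rng.normal(0.0, 0.92, n_items)

# Assumption: each rating follows a normal density around its mean
# (not clipped to the 1-5 scale); both systems face the same draws.
rating_draws = rng.normal(rating_means, rating_sd, size=(n_draws, n_items))

rmse_a = rmse_per_draw(pred_a, rating_draws)
rmse_b = rmse_per_draw(pred_b, rating_draws)
p_error = ranking_error_probability(rmse_a, rmse_b)

print(f"mean RMSE A: {rmse_a.mean():.4f}, mean RMSE B: {rmse_b.mean():.4f}")
print(f"estimated probability of ranking error: {p_error:.3f}")
```

Scoring both systems against the same rating draws is a design choice of this sketch; drawing ratings independently for each system would give a different, usually larger, estimate.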

Similar resources

Toward More Diverse Recommendations: Item Re-ranking Methods for Recommender Systems

Recommender systems are becoming increasingly important to individual users and businesses for providing personalized recommendations. However, while the majority of algorithms proposed in recommender systems literature have focused on improving recommendation accuracy (as exemplified by the recent Netflix Prize competition), other important aspects of recommendation quality, such as the divers...

Bennett Netflix 100 Winchester Circle

INTRODUCTION The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In 2007, the traditional KDD Cup competition was augmented with a workshop with a focus on the concurrently active Netflix Prize competition [2]. The KDD Cup itself in 2007 con...

The Ethics for Evaluating with an Emphasis on the Quranic Teachings

Evaluating performances, particularly that of researchers, has been considered as one of the most important problems in the fields of education and research. In many cases, even one score in evaluating a work would lead to getting or missing a prize unjustly. There can be found some Quranic teachings in the field to solve the problem. Paying attention to personal rights of those being criticize...

The Netflix Prize

In October, 2006 Netflix released a dataset containing 100 million anonymous movie ratings and challenged the data mining, machine learning and computer science communities to develop systems that could beat the accuracy of its recommendation system, Cinematch. We briefly describe the challenge itself, review related work and efforts, and summarize visible progress to date. Other potential uses...

Matrix factorization for the Netflix Prize

I compare two common techniques to compute matrix factorizations for recommender systems, specifically using the Netflix prize data set. Accuracy, run-time, and scalability are discussed for stochastic gradient descent and non-linear conjugate gradient.

Journal title:
  • CoRR

Volume abs/1706.08866  Issue 

Pages  -

Publication date 2017